A method to predict overall food preferences

Most natural ecosystems contain animals feeding on many different types of food, but it is difficult to predict what will be eaten when food availabilities change. We present a method that estimates food preference over many study sites, even when number of food types vary widely from site to site. Sampling variation is estimated using bootstrapping. We test the precision and accuracy of this method using computer simulations that show the effects of overall number of food types, number of sites, and proportion of missing prey items per site. Accuracy is greater with fewer missing prey types, more prey types and more sites, and is affected by the number of sites more than the number of prey types. We present a case study using lion (Panthera leo) feeding data and show that preference vs prey size follows a bell-curve. Using just two estimated parameters, this curve can be used as a general way to describe predator feeding patterns. Our method can be used to: test hypotheses about what factors affect prey selection, predict preferences in new sites, and estimate overall prey consumed in new sites.


Introduction
Most natural ecosystems contain animals feeding on many different types of food, but it is difficult to predict what will be eaten when food availabilities change. Ecologists face this issue all the time. For example, if we introduce a biocontrol agent then will it escape regulation by predators? Or if climate change shifts the range of a plant species then how will that affect feeding on other plants eaten by a common herbivore? Or if an invasive species appears then how will its predators respond? We need to be able to predict food preference for any combination of food types.
There have also been some excellent studies looking at how dietary preference varies with factors such as amounts of food [7,13], seasonality [14,15] or habitat [16,17]. However, these a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 studies have all been limited to narrow conditions. It has been difficult to predict on a large scale how preference varies with density or habitat or season, because it is difficult to disentangle effects of those factors vs the effects of other food types. One of us (MWH) has simply averaged preferences over many sites, assuming that large sample sizes from diverse food communities across large geographic areas will account for the challenges of comparing preferences between diverse communities [18][19][20].
One main reason for this general paucity in predicting preferences is methodology. While there is a mature field of study about ways to estimate prey preference in one study site [21][22][23][24][25], there is no method to combine preference estimates from many study sites and then predict preference in a new study site that has a unique combination of prey types.
The difficulty in combining prey preferences over many sites arises from the fact that all measures of preference are relative to the prey items present at individual sites. Just adding or subtracting one prey type may completely change preferences for the other prey; thus, it is difficult to combine preferences among sites that have different types of prey.
For example, suppose a predator prefers 4 prey items in this order A>B>C>D. In a site containing only prey items C & D, the predator would select for C. In a site containing prey items A, B, C, the predator would select against C. It is not known how to combine data from those 2 sites to predict selection when all prey are present. One would estimate that both C and B have intermediate selection values but it is not known how to estimate overall preference for B vs C. The problem is much more difficult when 20 prey species exist in 40 different sites.
Previous studies have calculated prey preference values for each prey type within each study site, then averaged these individual preference values over all study sites for each prey type [26,27]. The problem with this is that prey preference estimates within one study site are relative to the prey types present and thus cannot be directly combined with other study sites. Johnson [28] developed a method to combine preferences estimates, but rather than estimating a specific preference, the method estimated a rank for each prey item.
We present a method that estimates overall prey preference by combining data over many study sites, even when not all sites contain all prey items. Our method adjusts for the relative nature of preferences at individual study sites, allowing one to predict feeding preferences in new study sites, or when prey items change. This method also produces estimates of sampling variation, and thus, confidence intervals for preference estimates. We show how to increase precision by weighting the estimates by both prey densities and sampling effort. We then illustrate the method with a simple numerical example and test the method's accuracy and precision using simulations. Finally, we show an example application using data from lion (Panthera leo) feeding.

The proposed method
The aim is to estimate overall feeding preference by one species of animal, based on data from more than one study site. Since our method is based on iteration, we will call this the Iterative Preference Averaging (IPA) method. Note that, following convention, we refer to foods as "prey", and the feeder as the "predator". However, our method applies to all types of animals, not just predators, and the selected items could be vegetation, or even non-food items, such as types of habitat. All analyses were carried out using the Wolfram Language [29], and R code for the final method is given in Nams & Garnett [30]. See Fig 1 for an overview. (e.g. transect counts [31], capture-recapture [32], or track counting [33]), and do not have to be absolute (i.e. one can use indices of abundance). Similarly, the measures of prey consumption can originate by any means (e.g. DNA analysis of stomach contents [34], direct counts of the number of kills [35], or hair identification from scat contents [36]). Prey abundance and consumption estimates can be obtained by different methods at each site, as long as the same methods are used at each site. Thus, IPA is a robust method that can be used for meta-analyses combining results of many studies.

Details
Let m = number of sites, n = overall number of prey items, d ij = density of prey item j at site i e ij = amounts of prey item j consumed at site i.
Step (i). At each site we estimate within-site preferences for those prey that are available: Note that the / ij all sum to 1 for each site, and j is summed over all prey types present in site i, (not all prey are available at each site). This preference estimate is the Manly's / [22,23].
We want to estimate: p j = overall prey preference for prey item j. This is equivalent to an overall / j when all prey types are present.
We explain the procedure with an example. Suppose we have m = 4 sites and n = 6 prey types. The within-site preferences (/) are given by the following matrix, where columns are prey types and rows are sites. The dashes show when the prey type is not present at that site.

ð2Þ
Step (ii). We are estimating overall prey preferences iteratively. We start with some initial values of overall prey preference (p i values), and then update these at each iteration k. For the first iteration we choose equal values that sum to 1, and for the k th iteration, we use the averages summed over all sites.
Step (iii). The α values at each site initially sum to 1. These are then scaled to the values you would expect if all prey were at that site-this step is the key to the whole method. This scaling is done because preferences cannot be simply averaged over sites, because at each site they are relative to each other-i.e. they sum to 1, even though some prey are missing. Thus, in order to combine them, we rescale the preferences so that they would sum to 1 if all prey items were present.
We calculate this scaling constant by the following. Since all prey items are not present, we replace the missing prey items with the overall prey preference estimates. Then we choose the constant so that the new preferences sum to one.
Letp �jk = the estimate of prey preference for prey type j, averaged over all sites (i.e. overall preference), for the k th iteration of the estimation process, p ijk = the estimate of prey preference for prey type j, at site i (i.e. within-site preference), for the k th iteration of the estimation process, P k = the matrix of all preference estimates for each prey type at each site, for the k th iteration of the estimation process, p k = the vector of overall estimates of prey preference, for the k th iteration of the estimation process, p = the final vector of overall estimates of prey preference, A i = the set of prey items that are available in site i, M i = the set of prey items that are missing in site i, c i = the constant that rescales all preferences These new preferences in each site all sum to 1-i.e.: X If we re-arrange Eq (3), noting that the / ij sum to one, and solve for c i , we get that the scaling constant for each site is For example, in site 1 prey types {1,2,4,5} are present, and thus all of the preference values in site 1 are scaled by: Step (iv). The missing values in matrix (6) are replaced by the current estimates p jk , giving: ð7Þ This matrix givesp ijk ; the estimates for the k th iteration for within-site preferences for each prey type at each site. Note that each row now sums to 1.
Step (v). The overall prey preferences for the next iteration are estimated by the means of the columns in matrix (7). E.g. the overall preference for prey item 1, for the next iteration would be:p Step (vi). Compare the new estimate for overall preference to the one used in step (ii). If there is a change in the estimates, then repeat steps (ii)-(v) with the same initial table of preferences but with the new vector of overall prey preferences. The result is a set of overall preference estimates. These preferences range from 0 to 1 and sum to 1, and the important aspect is their values relative to each other, not their absolute values.
Step (vii). Ecological estimates should always have a measure of their error [37]. This can be estimated using bootstrapping [38], treating each site as an independent sample. Briefly, this would involve taking a random number of m rows (sampled with replacement) from the matrix in Eq (2) and then estimating preferences using IPA. This would be repeated many times (typically 1,000-10,000) [38], and the mean and standard deviation values calculated from the random sample estimates. After each bootstrapping sample, the preferences should be transformed by arcsine(2 x − 1) (since prey preferences represent a proportion, and proportions are known to have variances that depend on the mean). Then means, variances, and confidence intervals would be estimated. The means and confidence intervals should be backtransformed. Note that bootstrapping is not a replacement for adequate sampling-i.e. since the sampling units are sites, using very few sites will result in low precisions (wide confidence intervals).
Preliminary simulations showed that using this transformation significantly increased accuracy of the estimates.

Weighting
These estimates can be improved by appropriate weighting, since variances among prey preference estimates are not equal. Typically these variances are not known, however they are decreased by sampling more predators and in areas of higher prey density. For example, the preference estimate would be very variable for rare prey types. Thus we can use inverse-variance weighting [39] to minimize variation of the overall preference estimates, by weighting by prey consumption effort and/or density. Note that this can only be done when prey consumption and/or density are estimated by the same methods across all sites.
Prey consumption effort weighting is carried out as follows. For each prey type the precision of overall preference would be affected by the total number of feeding samples collected at each site. Let: f i = sample size used to estimate prey consumption-e.g. number of individuals eaten, number of scats, etc.
Weighting Eq (8) by feeding sampling effort gives: Density weighting is carried out as follows. Let A i = the set of prey items that are available in site i, and n i = number of prey types present at site i-thus, n i = |Ai|.
Since not all prey types are at each site, there are some missing values in the densities d ij . We handle them by imputation-i.e. by substituting the mean density of all other prey at that site. Thus, if prey type s is not present at site i, then for it's density we use: Weighting Eq (8) by relative prey densities gives an estimate for overall preference for prey type 1 for the next iteration of:p Weighting by both prey consumption effort and prey densities is carried out by weighting Eq (8) as follows:p

A numerical example
In this toy example, there are 6 prey types at 4 sites.
Step (i). The values in each cell are within-site preferences-note that rows sum to 1 for each site. Step (ii). We start with initial estimates of equal values of overall preference for the first iteration:p Step (iii). We rescale the / ij values by multiplying (for each site) by Eq (4): to get: Step (iv). We replace the missing values by the overall preferences from Eq (14): Step (v). We estimate new overall values of overall preference by the mean of the values at each site. Thus overall preferences for all prey types at iteration #2 are: For example, the overall preference for prey type 1 (0.197) is the mean of the values in column 1 in matrix (17).
Repeating steps (ii)-(iv)\ Step (v).p Repeating steps (ii)-(iv) Step (v).p The overall preferences estimates show little changes between the 3 rd and 4 th iteration (Eqs (20) vs (22). Thus, the final estimate for overall preferences for each prey type is given by Eq (22).

Methods
We tested the accuracy and precision of IPA using simulations. We varied the number of prey types, number of sites and the proportions of prey types missing per site. For each simulation, overall preferences (p i ) were generated using a uniform distribution. Then mean prey density for each site was generated using a normal distribution with a mean of 100 and a coefficient of variation of 0.5, truncated to a minimum value of 5. Then the density of each prey type within each site was generated using a normal distribution with a mean of the site prey density and a coefficient of variation of 0.5. "Preference" was treated as the probability of choosing a prey item each time it was encountered. Thus the number of prey consumed for each prey type within each site was generated using a binomial distribution with a mean of p i and an n of prey density. Finally, sites with incomplete collections of prey were simulated by using a binomial random number generator to randomly drop prey types from each site.
We then used bootstrapping to estimate confidence intervals with 100 bootstrapping samples. Confidence intervals were estimated using the bias-adjusted Studentized bootstrap [40].
We compared the IPA estimates to simply averaging preferences over all study sites (we will call this "Averaged Preference"). We calculated those indices for prey types at each site, and then calculated the mean for each prey type over all sites (while ignoring the missing prey types at each site).

Results
IPA gave more accurate and precise estimates of overall prey preference compared to the averaged preference. For example, Fig 2 shows the results of a simulation with 100 sites, 6 prey types, 50% of prey types present (varying from 10-90%) at each site, and 100 replicates. The averaged preference overestimates preference when preference is low and underestimates when preference is high. The lower precision and accuracy of the averaged estimates show the difficulty of estimating overall preference without considering missing prey types at each site.
The IPA estimates are more precise when they are weighted by both prey densities and number of feeding samples for most combinations of parameters (Fig 3). Confidence intervals for weighted estimates are about 75% narrower than those for unweighted estimates. Weighting is more effective with a large number of sites. The proportion of missing prey types, the number of prey types and number of sites interact to affect precision. Precision is greater with fewer missing prey types, more prey types and more sites.
Accuracy of bootstrapped estimates of confidence intervals varies with prey characteristics. The proportion of missing prey types, and the number of prey types and sites, all interact to affect accuracy. Generally, accuracy is greater with fewer missing prey types, more prey types, and more sites, for most combinations of parameters (Fig 4). There is little effect of weighting on accuracy. When there are more than 20 prey types, then bootstrapped confidence intervals are slightly too narrow-e.g. when there are less then 0.3 of prey types missing in each site, estimated 95% confidence intervals are actually like 90%.

Case study
We illustrate the IPA method with lion feeding data [41] that includes 56 sites and 42 prey species. The high proportion of absent species at each site (0.7) should decrease accuracy and precision of the estimators, but the high number of sites and species should increase precision (Fig 3), and the high number of sites should increase accuracy (Fig 4). We estimated the overall preference of each prey type using both weighted and non-weighted estimators. To display the results, we ranked them according to prey weight and then fitted the general bell-shaped where k 1 , k 2 , k 3 = fitted parameters pref = overall prey preference estimated by IPA wt = log(prey weight).
Since the preferences are relative to each other, we scaled them so that the maximum function value would be 1-i.e. the preferences are all scaled relative to the maximum. We did not estimate the weighted IPA estimates because prey density and consumption surveys were not carried out in the same manner among sites.
Analysing overall preference by prey weights in this way yields several biological insights. First, we can describe feeding preferences by just using two parameters, representing the mean (k 1 ) and standard deviation of the feeding curve (k 2 ) (NB: not k 3 , since this is the same for all values of preference). k 1 is a measure of average size of prey eaten, and k 2 is a measure of feeding specialization (the width of the curve). If other predators fit similar curves, we can use those parameters to compare different predators and to estimate competition among predators living in the same system. Second, we can estimate prey consumption by predators in systems with many prey types. Typically, potential prey consumption has been measured by classifying potential prey as either preferred vs non-preferred [42]-this assumes that the only preference choices are completely for or against. We can use this fitted curve to scale densities of each prey type according to preference, in order to get realistic estimates of prey consumption.  Finally, we can identify prey types that merit more research. If overall preference for some prey type significantly deviates from the preference vs size curve, this means that the predator selects for that prey using other properties than just prey size. For example, lions prefer wildebeest (Connochaetes sp) more than any other prey species (Fig 5), and the confidence intervals for preference are outside of the preference vs size curve, showing that wildebeest are preyed upon more often than expected simply due to their size. This is supported by results showing that that wildebeest selection is also affected by rainfall [43].

Sampling issues
There is some flexibility in how input data are gathered. The data input to IPA are prey abundances and consumption. But, because the method uses relative preferences, these data do not have to be estimated in the same way among sites-just the same within each site. Thus, one could estimate relative consumption based on scat analysis at one site and stomach contents at another site.
However, using the weighting when estimating sampling variation does somewhat limit application. IPA can use two kinds of weighting: based on prey densities and the total number of feeding samples in a site. Both types of sampling have to be carried out in the same way for all sites. Lion prey preferences show a sigmoidal relationship based on prey weight. Prey preference is estimated for lion prey items from a series of study sites. Each dot represents one prey type and bars represent 95% confidence intervals. The solid red line represents a best-fit curve for a symmetrical curve of estimated relative overall preference vs log(prey weight) and the shaded areas represent 95% mean prediction bands.
https://doi.org/10.1371/journal.pone.0268520.g005 IPA overall preferences may need to be scaled appropriately, depending on how they are used. Overall preferences are relative to each other and sum to one. But if one wants to predict preferences in a new site with a more limited set of prey types, then those preferences no longer sum to one. Thus overall preferences need to be scaled according to the prey types available at that site, as follows: If A = the set of prey items that are available in the new site, then the predicted preferences for prey type j in that study site are:p For example, suppose that there are 10 species, and overall preferences are estimated to be: The preferences need to be scaled differently for consumption estimation. In order to estimate overall prey consumption, consumption = preference × density, and thus the maximum possible preference would be 1. If you assume that the most preferred prey in that site is effectively always chosen, then you would scale all the preferences by dividing by the maximum value. Using the above example, we would divide the values for those species from (25), by the maximum, 0.2, and get predicted preferences in the new site as: To estimate prey consumption, one would multiply the values in (27) by prey densities.
2. "No preference" for Manly's / is 1/(# of prey types), while for Jacobs' index it is 0. Consequently, for Manly's / the range of prey preferences for avoidance is smaller than for selection whereas for Jacobs' the range of prey preferences for avoidance vs selection is the same.
3. Jacob's index is affected by relative prey densities when there are more then two prey types [24,45]-thus, one can only compare preferences in different situations if prey densities are the same. Manly's / is independent of prey densities.
4. The total of Manly's / for all prey types is 1 but there is no constant total for Jacobs' index.
5. All preference indices are relative. If preferences are estimated for a large suite of prey items, then one cannot just apply those values directly in a different situation with fewer prey. However Manly's / can be re-normalized to take into account the dropping of prey items IPA uses Manly's α because IPA hinges on scaling the preference measures to those you would expect if all prey types were present in each site (Step 1). The scaling requires that prey preferences range from 0 to 1, and sum to 1 in each site-and Manly's α is the only published index that does this.
However, Manly's α has a weakness in that it is sensitive to variations in densities of rare prey. IPA minimizes this problem in two ways: first, mean preference (Step 3) is weighted by prey density, and second, when estimating sampling error during bootstrapping, the preferences are transformed by arcsine (2 x − 1).
On the other hand, Manly's / is useful when predicting total biomass of prey consumedone simply multiplies Manly's / by prey density and body mass. In addition, Manly's / is biologically relevant in that it represents the relative probability of eating a prey once it is encountered [23].

Availability biases
When organism densities are estimated, several availability biases can occur, and these can potentially affect preference estimates. False absences occur when individuals are present but undetected, and false presences occur when individuals are not present but are wrongfully identified to be present [46]. If such biases underestimate prey availability, then preferences are overestimated. However, IPA is only affected indirectly by such biases, since it uses estimates of prey availability, but does not specify how those estimates should be obtained. Thus, prey availability estimates can be made using one of the many density estimation methods that minimize such biases [37].

Ecological differences among sites
If important ecological differences exist among sites, then this might affect the accuracy of overall estimates of prey preference, and thus of conservation interventions. There are two ways of dealing with this. First, find out if preferences do differ among sites, by comparing standardized preferences within each site (i.e. step (iv)) to overall preferences. Currently this would be done visually, and thus work is needed to develop statistical tests for this. Sites that are different would then be treated separately. Second, instead of analyzing preference based on prey species, one could use prey weights. Many predators select prey mostly based on size, not prey species [27]. The lion example discussed above illustrates this.

Applications
IPA can be generalized. In the description of IPA, we use the term "prey types"; prey can be categorized in many ways. For example, Lungdoh et al. [44] classified prey using broader taxonomic units than species. Or one might use some features of the prey, such as size or microhabitat use. These features could be revealed using IPA. Then, one could classify prey according to that feature and then use IPA to generate final estimates of overall preference. We can also generalize the resource; IPA can be used for any type of resource, such as habitats or nest sites.
In studies of predator-prey relationships, IPA can also be used to estimate overall prey consumption. Many of the key studies of predator-prey relationships have focused on systems with a limited number of main prey types (for example, the snowshoe hare (Lepus americanus)-lynx (Lynx canadensis) system [47] and the wolf (Canis lupus) -moose (Alces alces) system [48]), because it is difficult to estimate overall prey consumption when there are many prey types. When researchers have studied systems with many prey types, they have had to make simplifying assumptions. For example, Lindsey et al. [49] compared cheetah (Acinonyx jubatus) prey requirements in 12 South African game reserves in order to manage re-introductions of cheetah. To combine data from many sites, the researchers had to focus on only the two most important prey types. Using IPA, we can estimate relative prey preferences for all prey types and thus combine them into one overall measure of prey consumption. Also, if we include prey weight, we can estimate overall biomass of prey consumed. This flexibility will make it easier to build predator-prey models with many varied prey types.
One can use IPA to test hypotheses about factors that affect prey preference. The type of test depends on whether the factors are inherent properties of the prey types or ones that can vary among sites. For example, some properties that are inherent are handling time, risk of injury during capture, taste and size of the prey [41]. Some properties that vary among sites are types of habitats, and prey abundance and diversity [41]. Using IPA, we can test for the inherent properties by comparing overall estimated preferences among different prey types. A strength of using IPA to do this is that estimating preferences over many sites ensures that effects of the varying properties should average out. Thus, the more, and different, sites, the better.
We can test for the varying properties by analyzing the differences between estimated and observed preferences for each prey type and site. IPA preferences are estimated under the assumption of constant preferences among sites, and thus can be viewed as a null hypothesis. Models can be tested predicting the differences between estimated and observed preferences as a function of prey density, prey diversity, or various habitat features. This allows us to test hypotheses that had previously been limited to a narrow range of study sites. For example, Prugh [7] showed that coyotes (Canis latrans) change prey preference in response to changes in snowshoe hare (Lepus americanus) density. Such comparisons were possible because Prugh worked within one study site over several years, with a relatively constant number of prey types. Such an analysis can now be carried out using IPA over a wide range of study sites that differ in prey types. As of July 2019, there are at least 109 studies of coyote feeding habits that could be used in such a comparison.
Application of IPA allows us to: test hypotheses about what factors affect prey selection, predict preferences in new sites, and estimate overall prey consumption in new sites. The method's robust flexibility could lead to a general theory of feeding preference that will allow us to understand and predict food choice.
Supporting information S1 Appendix. Using IPA to estimate the Jacob's preference index. (DOCX)